Fighting with the Sparsity of Synonymy Dictionaries

Authors

  • Dmitry Ustalov
  • Mikhail Chernoskutov
  • Christian Biemann
  • Alexander Panchenko
Abstract

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one performs a pre-processing of the graph by adding missing edges, while the second one performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the problem of sparsity of the synonymy dictionaries.
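The two approaches named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how missing edges are chosen or how cluster similarity is measured, so everything concrete here is an assumption made for illustration: the function names `add_missing_edges` and `merge_similar_synsets`, the caller-supplied `similarity` function, the Jaccard overlap criterion, and the 0.5 thresholds are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def add_missing_edges(graph, similarity, threshold=0.5):
    """Pre-processing sketch: return a copy of the synonymy graph with an
    edge added between every unconnected word pair whose similarity score
    reaches the threshold. `similarity` and `threshold` are assumptions."""
    expanded = {word: set(neighbours) for word, neighbours in graph.items()}
    for u, v in combinations(sorted(graph), 2):
        if v not in expanded[u] and similarity(u, v) >= threshold:
            expanded[u].add(v)
            expanded[v].add(u)
    return expanded


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar_synsets(synsets, threshold=0.5):
    """Post-processing sketch: repeatedly merge any two synsets whose
    Jaccard overlap reaches the threshold, until no such pair remains.
    The overlap criterion is an assumption, not the paper's measure."""
    merged = [set(s) for s in synsets]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if jaccard(merged[i], merged[j]) >= threshold:
                merged[i] |= merged[j]
                del merged[j]  # j > i, so index i stays valid
                changed = True
                break  # restart the scan on the modified list
    return merged
```

In this sketch the pre-processing step changes the input to the clustering algorithm, while the post-processing step leaves the graph untouched and operates only on the induced synsets. Any graph clustering method could sit between the two calls.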




Journal:
  • CoRR

Volume abs/1708.09234

Publication date: 2017